Derivation and validation of novel integrated inpatient mortality prediction score for COVID-19 (IMPACT) using clinical, laboratory, and AI-processed radiological parameter upon admission: a multicentre study

Limited studies explore the use of AI for COVID-19 prognostication. This study investigates the relationship between AI-aided radiographic parameters, clinical and laboratory data, and mortality in hospitalized COVID-19 patients. We conducted a multicentre retrospective study. The derivation and validation cohorts comprised 512 and 137 confirmed COVID-19 patients, respectively. Variable selection for constructing an in-hospital mortality scoring model was performed using the least absolute shrinkage and selection operator, followed by logistic regression. The accuracy of the scoring model was assessed using the area under the receiver operating characteristic curve (AUROC). The final model included eight variables: anosmia (OR: 0.280; 95%CI 0.095–0.826), dyspnoea (OR: 1.684; 95%CI 1.049–2.705), loss of consciousness (OR: 4.593; 95%CI 1.702–12.396), mean arterial pressure (OR: 0.928; 95%CI 0.900–0.957), peripheral oxygen saturation (OR: 0.981; 95%CI 0.967–0.996), neutrophil % (OR: 1.034; 95%CI 1.013–1.055), serum urea (OR: 1.018; 95%CI 1.010–1.026), and affected lung area score (OR: 1.026; 95%CI 1.014–1.038). The Integrated Inpatient Mortality Prediction Score for COVID-19 (IMPACT) demonstrated a predictive value of 0.815 (95% CI 0.774–0.856) in the derivation cohort. Internal validation resulted in an AUROC of 0.770 (95% CI 0.661–0.879). Our study provides valuable evidence of the real-world application of AI in clinical settings. However, it is imperative to conduct prospective validation of our findings, preferably utilizing a control group and extending the application to broader populations.

More than 3 years have passed since the first case of Coronavirus disease 2019 (COVID-19) appeared in Wuhan, China, and started a once-in-a-century pandemic 1. While the treatment and prevention of severe COVID-19 disease have advanced rapidly, this respiratory viral disease remains an important source of worldwide morbidity and mortality 2. New waves of cases are still being reported owing to viral mutation, driven partly by antivirals and less efficacious vaccines, which select for more resistant strains. Learning from the constraints caused by the COVID-19 waves, a clinical decision tool is essential for managing outbreaks of COVID-19 cases, as it helps triage patients and thus prevents scarcity of hospital beds and medical resources 3.
Numerous clinical decision tools have been created and published for this purpose, such as the COVID-GRAM and the 4C Mortality score 4,5. To the best of our knowledge, the clinical decision tools available today utilize clinical parameters and laboratory data only. This is understandable because these tools must be simple and practical and have adequate accuracy to be used clinically. Consequently, no clinical decision tool incorporates AI-derived radiographic parameters of COVID-19 patients.
The role of artificial intelligence (AI) in the medical field has expanded rapidly. This role has mainly been limited to screening and diagnosis. For example, in the field of pulmonology, AI-aided radiographic interpretation of chest X-ray (CXR) images proved to be sensitive and accurate for pulmonary tuberculosis screening 6. The role of AI in COVID-19 diagnosis has also been reported. In one study, CAD4COVID-Xray (an AI software), through the color heatmap method, performed comparably to six radiologists in diagnosing COVID-19 pneumonia 7,8. In contrast, the evidence on incorporating AI-aided radiographic interpretation of CXR for predicting clinical outcomes is scarce.
This study aimed to investigate the relationship between AI-aided radiographic parameters, clinical and laboratory data, and clinical outcomes in hospitalized COVID-19 patients with confirmed RT-PCR results. Additionally, we aimed to develop and validate a clinical risk tool known as the Integrated Inpatient Mortality Prediction Score for COVID-19 (IMPACT) by integrating these data.

Study design
This was a retrospective cohort study using secondary data from medical records and Picture Archiving and Communication System (PACS) chest radiography repositories. This study was conducted at three academic hospitals, i.e., Airlangga University Hospital, located in Surabaya, East Java Province; Sardjito General Hospital, located in Jogjakarta, Special Region of Jogjakarta Province; and dr. Cipto Mangunkusumo General Hospital, located in Central Jakarta, Special Capital Region of Jakarta. The ethics committees of the respective institutions (University of Gadjah Mada, University of Airlangga, and University of Indonesia IRB) approved the study. The requirement for written informed consent was waived owing to the use of anonymized historical data. All methods were performed in accordance with the relevant guidelines and regulations and adhered to the Declaration of Helsinki.
Data obtained from the first two hospitals were used for the derivation of the novel scoring system. Data obtained from the latter hospital were used for the validation of the novel scoring system.
This study enrolled a cohort of adult patients (≥ 18 years old) hospitalized with COVID-19 between April 2020 and April 2022 for the derivation cohort. The validation cohort, however, included only hospitalized COVID-19 cases from April 2020 to April 2021. The reasons for this approach are twofold. Firstly, we utilized data from a separate study to constitute the validation cohort. Secondly, due to the expiration of our software permit, we were unable to access the CXR software required for additional analysis. As a result, we relied solely on the available data for analysis.
The decision to utilize chest X-ray rather than more advanced imaging modalities is driven by two primary considerations. First and foremost is Indonesia's classification as a low to middle-income country (LMIC). The accessibility of chest CT-scans is constrained, predominantly concentrated in major cities, reflecting the uneven

Outcomes
The primary outcome of this study was in-hospital mortality. The secondary outcome was disease progression, defined as an increase of at least one degree in disease severity (e.g., progression from moderate to severe disease).

AI system for chest X-ray interpretation
CAD4COVID-Xray software (Thirona, Nijmegen, Netherlands; https://covid.cad4tb.care/accounts/login/?next=/; based on CAD4TB ver. 6) was used for AI chest X-ray interpretation. The principal objective of the CAD4COVID-Xray software is to facilitate the triaging process in environments with limited resources and in areas with a high prevalence of COVID-19. This product holds a CE certification and employs the identical technical core utilized by CAD4TB, another CE-certified product registered by the FDA in Ghana. Consequently, CAD4COVID-Xray is developed to the same high-quality standard as CAD4TB, a standard substantiated by validation through over 40 academic publications. CAD4TB has been successfully deployed in 35 countries, playing a pivotal role in screening six million people globally 12. Hence, its reliability, pertinence, and applicability have undergone rigorous validation, affirming that CAD4COVID-Xray is not only a dependable solution but also possesses relevance and generalizability across diverse healthcare scenarios.
The AI software relies on a color heat-map method to detect parenchymal abnormality on chest X-rays (Fig. 1). The software executed the series of steps outlined below:
• Area Analysis: Determination of the proportion of lung parenchyma affected.

Filter Weight Determination
• The weights of each filter were determined, and an average filter weight was applied as a mask on the CXR image. This created a color heat map visible only on the lung area previously segmented by the trained model.

Color Heat Map Representation:
• The color heat map displayed colors according to the data weight: red, yellow, green, and blue represented high, medium, low, and extremely low probabilities of abnormality, respectively.

CAD4COVID Software Scoring:
• The CAD4COVID software received the digital CXR file and generated two AI scores:
• Affected Lung Area (ALA) Score: Calculated from the total lung volume with abnormalities found on the CXR, ranging from 0 to 100. A higher score indicates a greater impact on lung tissue.
• COVID Probability Score: Determined by the average final weight of all layers, ranging from 0 to 100. A higher score suggests a higher likelihood of COVID-19.
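As a rough illustration of how a 0 to 100 affected-area score can be read, the sketch below computes the percentage of lung pixels whose abnormality probability reaches a cut-off. It is not the CAD4COVID-Xray algorithm, whose internals are not described here; the function name, the 0.5 threshold, and the list-of-lists inputs are assumptions made only for this example.

```python
def affected_lung_area_score(abnormality_map, lung_mask, threshold=0.5):
    """Percentage (0-100) of lung pixels flagged as abnormal.

    abnormality_map: rows of per-pixel abnormality probabilities (0.0-1.0).
    lung_mask: rows of 0/1 flags marking pixels inside the segmented lung.
    threshold: hypothetical cut-off above which a pixel counts as affected.
    """
    lung_pixels = 0
    affected_pixels = 0
    for prob_row, mask_row in zip(abnormality_map, lung_mask):
        for prob, in_lung in zip(prob_row, mask_row):
            if in_lung:
                lung_pixels += 1
                if prob >= threshold:
                    affected_pixels += 1
    if lung_pixels == 0:
        raise ValueError("lung mask contains no lung pixels")
    return 100.0 * affected_pixels / lung_pixels


# Toy 2x2 "image": three pixels lie inside the lung, two of them are abnormal.
toy_map = [[0.9, 0.1], [0.6, 0.2]]
toy_mask = [[1, 1], [1, 0]]
toy_score = affected_lung_area_score(toy_map, toy_mask)
```

Restricting the count to the segmented lung region mirrors the description above, in which the heat map is visible only on the lung area.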

Variable selection and establishing a scoring system
For the variable selection and scoring system derivation, we included all 512 hospitalized COVID-19 patients in the derivation cohort. In the selection process, we entered 66 variables. We applied Least Absolute Shrinkage and Selection Operator (LASSO) regression to minimize the potential collinearity of variables measured in the same patient and to prevent overfitting. To deal with missing values, imputation was considered for variables with less than 25% missing values. We used Multiple Imputation by Chained Equations (MICE) to impute numeric, binary, and factor variables 13. In our multivariable analyses, we used LASSO regression with L1 penalization and tenfold cross-validation for internal validation 4.
Based on the value of its tuning parameter, this logistic regression model imposes penalties on the absolute magnitudes of the regression coefficients. Greater penalties shrink the estimates of weaker predictors towards zero, leaving only the most significant predictors in the model. The most predictive factors were those retained at the tuning parameter value that met the minimum criteria (min). The LASSO regression was carried out using the "glmnet" package for R. The variables identified by the LASSO regression analysis were then entered into logistic regression models, and the risk score was created using the factors that remained consistently statistically significant (supplementary file).
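To make the shrinkage behaviour concrete, the following is a minimal sketch of L1-penalized logistic regression fitted by proximal gradient descent. It is not the glmnet procedure used in the study: the penalty `lam` is fixed by hand rather than chosen by tenfold cross-validation, and the toy data, function names, step size, and iteration count are all assumptions for illustration.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink towards zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_l1_logistic(X, y, lam, step=0.5, n_iter=500):
    """Fit logistic regression with an L1 penalty on the slopes (not the intercept)."""
    n, p = len(X), len(X[0])
    intercept, coefs = 0.0, [0.0] * p
    for _ in range(n_iter):
        # Residuals (predicted probability minus outcome) at the current estimate.
        residuals = [
            sigmoid(intercept + sum(c * x for c, x in zip(coefs, row))) - target
            for row, target in zip(X, y)
        ]
        intercept -= step * sum(residuals) / n
        for j in range(p):
            grad_j = sum(r * row[j] for r, row in zip(residuals, X)) / n
            # Gradient step followed by soft-thresholding (the L1 shrinkage).
            coefs[j] = soft_threshold(coefs[j] - step * grad_j, step * lam)
    return intercept, coefs


# Hypothetical toy data: feature 0 drives the outcome, feature 1 is pure noise.
levels = [(-2.0, 0), (-1.0, 0), (-0.5, 1), (0.5, 0), (1.0, 1), (2.0, 1)]
X = [[x0, x1] for x0, _ in levels for x1 in (-1.0, 1.0)]
y = [label for _, label in levels for _ in (-1.0, 1.0)]

intercept, coefs = fit_l1_logistic(X, y, lam=0.05)
```

In this toy example the coefficient of the uninformative feature stays at exactly zero while the informative one remains positive, which is the selection effect the penalty is meant to produce.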

Accuracy assessment
The accuracy of the IMPACT score was assessed using the area under the receiver operating characteristic curve (AUROC). Statistical analysis was performed with the IBM Statistical Package for the Social Sciences (SPSS) for Macintosh, version 27.0 (IBM Corp., Armonk, NY, USA), with statistical significance set at P < 0.05.
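For readers unfamiliar with the metric, the AUROC equals the probability that a randomly chosen patient who died receives a higher score than a randomly chosen survivor, with ties counted as one half. The sketch below implements that pairwise definition directly; it is an illustration only, not the SPSS routine used in the study, and the scores and labels are invented.

```python
def auroc(scores, labels):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    scores: risk scores, higher meaning higher predicted risk.
    labels: 1 for the event (e.g., in-hospital death), 0 otherwise.
    """
    positives = [s for s, label in zip(scores, labels) if label == 1]
    negatives = [s for s, label in zip(scores, labels) if label == 0]
    if not positives or not negatives:
        raise ValueError("both outcome classes are required")
    wins = 0.0
    for pos in positives:
        for neg in negatives:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5  # ties count as half a concordant pair
    return wins / (len(positives) * len(negatives))


# Invented example: four patients, two deaths.
example_auc = auroc([1, 2, 3, 4], [0, 1, 0, 1])
```

A value of 0.5 corresponds to chance-level discrimination and 1.0 to perfect separation of the two outcome groups.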

IMPACT score validation
The IMPACT score was validated using data from dr. Cipto Mangunkusumo General Hospital (RSCM), which included a cohort of 137 patients. RSCM is a national referral hospital situated in Jakarta, the capital city of Indonesia. As a result, the baseline characteristics of hospitalized COVID-19 patients in this cohort were highly diverse, showcasing the ethnic and racial heterogeneity of Indonesia. The data collected from RSCM underwent meticulous scrutiny and verification by two physicians (AS and JH). This dataset was used to compute the IMPACT COVID-19 mortality risk score in the same manner as described earlier for the derivation cohort.

Characteristics of derivation cohort
In the derivation cohort, we included a total of 512 patients from two academic medical centres located in Surabaya, East Java Province, and the Special Region of Jogjakarta. Abnormal chest X-rays were identified in 412 (80.47%) patients, and further details regarding laboratory findings can be found in Table 2.

IMPACT mortality score construction
The IMPACT mortality risk score was constructed by rounding the beta coefficients acquired from the logistic model. We employed the following formulas in the logistic model to compute the probability and 95% confidence intervals 14
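The formulas themselves are not reproduced in this text. As a hedged illustration of the standard logistic form, the sketch below converts the odds ratios reported in the abstract into beta coefficients (beta = ln OR), computes a predicted probability, and recovers a 95% confidence interval from a standard error. The intercept is not reported here, so the value passed in is purely hypothetical, as are the variable names, the units, and the example patient; the rounding used to build the final score is not shown.

```python
import math

# Odds ratios as reported in the abstract; dictionary keys are illustrative names.
ODDS_RATIOS = {
    "anosmia": 0.280,
    "dyspnoea": 1.684,
    "loss_of_consciousness": 4.593,
    "mean_arterial_pressure": 0.928,
    "spo2": 0.981,
    "neutrophil_pct": 1.034,
    "serum_urea": 1.018,
    "ala_score": 1.026,
}
BETAS = {name: math.log(odds_ratio) for name, odds_ratio in ODDS_RATIOS.items()}


def mortality_probability(patient, intercept):
    """Standard logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    logit = intercept + sum(BETAS[name] * patient[name] for name in BETAS)
    return 1.0 / (1.0 + math.exp(-logit))


def odds_ratio_ci(beta, standard_error, z=1.96):
    """95% confidence interval of an odds ratio: exp(beta +/- z * SE)."""
    return math.exp(beta - z * standard_error), math.exp(beta + z * standard_error)


# Hypothetical patient; the units and the intercept of 0.0 are assumptions.
patient = {
    "anosmia": 0, "dyspnoea": 1, "loss_of_consciousness": 0,
    "mean_arterial_pressure": 90.0, "spo2": 95.0,
    "neutrophil_pct": 75.0, "serum_urea": 40.0, "ala_score": 30.0,
}
p_without_loc = mortality_probability(patient, intercept=0.0)
p_with_loc = mortality_probability({**patient, "loss_of_consciousness": 1}, intercept=0.0)

# Back-derive the standard error of anosmia from its reported interval (0.095-0.826).
se_anosmia = (math.log(0.826) - math.log(0.095)) / (2 * 1.96)
ci_low, ci_high = odds_ratio_ci(BETAS["anosmia"], se_anosmia)
```

Because the intercept is a placeholder, the absolute probabilities are not meaningful; only the direction of the effects (for example, loss of consciousness raising the predicted risk) should be read from this sketch.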

IMPACT mortality score performance
The AUROC of the IMPACT score in the derivation cohort was 0.815 (95% CI 0.774-0.856), which indicates a model with good predictive ability (Fig. 3 panel a). The AUROCs of the IMPACT score for Airlangga University Hospital and dr. Sardjito General Hospital were 0.839 (95%CI 0.768-0.910) and 0.793 (95%CI 0.741-0.845), respectively.

IMPACT mortality score validation
The internal validation of the IMPACT score in the RSCM hospital yielded an AUROC of 0.770 (95%CI 0.661-0.879), indicating fair predictive ability (Fig. 3 panel b). The baseline and clinical characteristics of the validation cohort are presented in Table 4.
As previously stated in the Methods section, the disparity in the hospitalization periods between the validation cohort and derivation cohort for COVID-19 patients could raise concerns about the validity of our findings. To address this issue, we conducted a stratified analysis of the AUROC of the IMPACT score based on the year of admission.
score, derived from analysed chest X-rays (CXRs). The ALA score is generated by the CAD4COVID software, providing rapid results after uploading the image to the cloud-based system. While CT-scan is the preferred imaging modality for aiding COVID-19 diagnosis, it may not be accessible in many developing countries. On the other hand, CXR is readily available even in small hospitals. Additionally, logistical challenges such as patient transportation and the need for resuscitation in severe cases make CT-scan impractical 15.
In the final model, several variables were identified as protective against in-hospital mortality due to COVID-19. These variables include mean arterial pressure (MAP), peripheral oxygen saturation (SpO2), and anosmia. Higher MAP and SpO2 as protective variables may indicate a milder course of COVID-19 16.
Anosmia is considered a protective factor against in-hospital mortality from COVID-19. A retrospective study involving 576 patients found that those with anosmia had higher levels of lymphocytes, haemoglobin, and GFR, and lower levels of D-dimer and CRP, indicating a milder immune and inflammatory response to SARS-CoV-2 infection 17. Hendawy et al. also showed that anosmia is associated with mild chest infection 18.
Conversely, our model identified several risk factors for mortality, including dyspnoea, loss of consciousness (LOC), higher neutrophil percentage, and higher serum urea levels. These factors are well-known indicators of poor prognosis, even in community-acquired pneumonia 19,20. Importantly, elevated neutrophil levels contribute to COVID-19-associated coagulopathy through the formation of neutrophil extracellular traps (NETs) 21. NETosis, the process by which neutrophils release DNA structures to trap and kill pathogens, plays a crucial role in the immune response against infections, including viral infections. Excessive NET release leads to inflammation and tissue damage, and contributes to the cytokine storm observed in severe cases.
Notably, our study emphasizes the importance of quantifying parenchymal abnormalities with the help of an AI-based parameter for the purpose of determining mortality risk.
Many studies have evaluated the diagnostic ability of artificial intelligence against human readers. For example, Murphy et al. found that CAD4COVID-Xray, which was trained on 24,678 radiographs, including 1540 radiographs for validation, had comparable performance to six radiologists 8.
Another study by Kapoor et al. showed that AI utilization improved the triaging of COVID-19 cases in the emergency department when faced with patients presenting with flu-like symptoms. Combined with high-resolution CT AI analysis, CXR AI analysis had 97.9% sensitivity and 99% specificity 22.
In another study, AI parameters derived from CT-scan, namely the CT-severity score (CT-SS) and affected lung area (%AA), were associated with poor outcomes such as length of stay, risk of ICU admission, ICU length of stay, and risk of mechanical ventilation 23. The CT-SS had a good predictive ability for ICU admission in COVID-19 patients (AUROC = 0.84; 95%CI 0.79-0.90).
In contrast, the evidence supporting the use of artificial intelligence for disease prognosis is limited. To date, only one published study has specifically investigated this area. In their study, Shamout et al. developed a prognostic model using an artificial intelligence system 24. The model demonstrated the ability to predict deterioration within 96 h, achieving an AUROC of 0.786 (95% CI 0.745-0.830). To construct the model, the AI system utilized a dataset comprising clinical variables extracted from electronic health records and CXR images. Notably, the model effectively estimated the temporal risk evolution and considered relevant clinical endpoints, such as ICU admission, intubation, and in-hospital mortality.
Our model and the model by Shamout et al. differ in the clinical variables incorporated. Their model encompassed a larger set of clinical variables, potentially presenting challenges for replication in developing countries. In contrast, our model included a smaller set of clinical variables, each known to be a significant risk factor for COVID-19 outcomes. Consequently, our model may offer more advantages when employed in resource-limited areas.
While the WHO has lifted the pandemic status for COVID-19, this study aims to emphasize the importance of not forgetting the lessons learned from the past. Specifically, it highlights the dire need for artificial intelligence to enhance triaging capabilities, particularly during the peak of a pandemic, when resources and manpower are scarce.

Limitations
This study has several limitations. Firstly, the number of subjects included in both the derivation and validation cohorts is relatively small. Secondly, the study utilized retrospective data, highlighting the need for prospective validation. Thirdly, we did not utilize a control group. Finally, we did not analyse the impact of COVID-19 variants on the discriminative ability of the IMPACT scoring system.

Conclusion
Our multicentre study introduces the Integrated Inpatient Mortality Prediction Score for COVID-19 (IMPACT), a clinical risk tool that integrates clinical, laboratory, and CXR data. This study provides valuable evidence on the real-world application of AI in clinical settings, demonstrating its potential for enhancing decision-making and improving patient care. The derivation and validation of IMPACT highlight the transformative role of AI in healthcare, enabling more personalized and effective treatment strategies for COVID-19 patients. Our research aims to bridge the gap between AI advancements and practical use, facilitating wider adoption of AI-based tools and revolutionizing disease prognostication in healthcare. However, it is imperative to conduct prospective validation of our findings, preferably utilizing a control group and extending the application to broader populations.

Figure 1. Detection of lung parenchymal abnormality in the chest X-ray with the heatmap method.

Figure 2. Variable selection for model construction using the least absolute shrinkage and selection operator (LASSO) binary logistic regression model. (a) LASSO coefficient profiles of the 66 baseline variables. (b) Tuning parameter selection for the LASSO model using tenfold cross-validation and minimum criteria.

Figure 4. Area under the receiver operating characteristic curve of the IMPACT score stratified according to admission year.

Table 4. Baseline characteristics of the validation cohort (n[%]; mean ± SD; median [Q1-Q3]). Categorical data are summarized using frequencies and percentages n (%). Continuous data are presented either as mean values with standard deviation (mean ± SD) for normally distributed data, or as median values with the first and third quartiles [median (Q1-Q3)] for data that are not normally distributed.